Abstract

We prove a uniformly computable version of de Finetti's theorem on exchangeable
sequences of real random variables. As a consequence, exchangeable stochastic 
processes in probabilistic functional programming languages can be automatically 
rewritten as procedures that do not modify non-local state. Along the way, we 
prove that a distribution on the unit interval is computable if and only if its 
moments are uniformly computable. 
1. Introduction

The classical de Finetti theorem states that an exchangeable sequence of real 
random variables is a mixture of independent and identically distributed (i.i.d.) 
sequences of random variables. Moreover, there is an (almost surely unique) 
measure-valued random variable, called the directing random measure, condi- 
tioned on which the random sequence is i.i.d. The distribution of the directing 
random measure is called the de Finetti measure or the mixing measure. 

This paper examines the computable probability theory of exchangeable
sequences of real-valued random variables. We prove a uniformly computable
version of de Finetti's theorem, which implies that computable exchangeable
sequences of real random variables have computable de Finetti measures. The
classical proofs do not readily effectivize; instead, we show how to directly
compute the de Finetti measure (as characterized by the classical theorem) in
terms of a computable representation of the distribution of the exchangeable
sequence. Along the way, we prove that a distribution on $[0,1]^\omega$ is
computable if and only if its moments are uniformly computable, which may be of
independent interest.

A key step in the proof is to describe the de Finetti measure in terms of the 
moments of a set of random variables derived from the exchangeable sequence. 



When the directing random measure is (almost surely) continuous, we can show 
that these moments are computable, which suffices to complete the proof of the 
main theorem in this case. In the general case, we give a proof inspired by a 
randomized algorithm which succeeds with probability one. 

1.1. Computable Probability Theory 

These results are formulated in the Turing-machine-based bit-model for com- 
putation over the reals (for a general survey, see Braverman and Cook [1]). This 
computational model has been explored both via the type-2 theory of effectiv- 
ity (TTE) framework for computable analysis, and via effective domain-theoretic 
representations of measures. 

Computable analysis has its origins in the study of recursive real functions,
and can be seen as a way to provide "automated numerical analysis" (for a 
tutorial, see Brattka, Hertling, and Weihrauch [2]). Effective domain theory has 
its origins in the semantics of programming languages, where it continues to have 
many applications (for a survey, see Edalat [3]). Here we use methods from 
these approaches to transfer a representational result from probability theory to 
a setting where it can directly transform statistical objects as represented on a 
computer. 

The computable probability measures in the bit-model coincide with those 
distributions from which we can generate exact samples to arbitrary precision 
on a computer. As such, our results have direct implications for programming 
languages which manipulate probability measures on real numbers via exact in- 
terfaces. In many areas of statistics and computer science, especially machine 
learning, one is often concerned with distributions on data structures that are 
higher-order or are defined using recursion. Probabilistic functional program- 
ming languages provide a convenient setting for describing and manipulating 
such distributions. 

Exchangeable sequences play a fundamental role in both statistical models 
and their implementation on computers. Given a sequential description of an 
exchangeable process, in which one uses previous samples or sufficient statistics 
to sample the next element in the sequence, a direct implementation in these 
languages would need to use non-local communication (to record new samples 
or update sufficient statistics). This is often implemented by modifying the pro- 
gram's internal state directly (i.e., using mutation), or via some indirect method 
such as a state monad. The classical de Finetti theorem implies that (for such 
sequences over the reals) there is an alternative description in which samples are 
conditionally independent (and so could be implemented without non-local com- 
munication), thereby allowing parallel implementations. But the classical result 
does not imply that there is a program which computes the sequence according to 
this description. Even when there is such a program, the classical theorem does 
not provide a method for finding it. The computable de Finetti theorem states
that such a program does exist. Moreover, the proof itself provides the method for 
constructing the desired program. In Section 6 we describe how an implemen- 
tation of the computable de Finetti theorem uniformly transforms procedures 
which induce exchangeable stochastic processes into equivalent procedures which 
do not modify non-local state. 

This transformation is of interest beyond its implications for programming 
language semantics. In statistics and machine learning, it is often desirable to 
know the representation of an exchangeable stochastic process in terms of its 
de Finetti measure (for several examples, see Section 6.3). Many such processes 
in machine learning have very complicated (though computable) distributions, 
and it is not always feasible to find the de Finetti representation by hand. The 
computable de Finetti theorem provides a method for automatically obtaining 
such representations. 

2. de Finetti's Theorem 

We assume familiarity with the standard measure-theoretic formulation of
probability theory (see, e.g., Billingsley [4] or Kallenberg [5]). Fix a basic
probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{B}_\mathbb{R}$
denote the Borel sets of $\mathbb{R}$. Note that we will use $\omega$ to denote
the set of nonnegative integers (as in logic), rather than an element of the
basic probability space $\Omega$ (as in probability theory). By a random measure
we mean a random element in the space of Borel measures on $\mathbb{R}$, i.e., a
kernel from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}_\mathbb{R})$. An
event $A \in \mathcal{F}$ is said to occur almost surely (a.s.) if
$\mathbb{P}A = 1$. We denote the indicator function of a set $B$ by
$\mathbf{1}_B$.

Definition 2.1 (Exchangeable sequence). Let $X = \{X_i\}_{i \ge 1}$ be a
sequence of real random variables. We say that $X$ is exchangeable if, for every
finite set $\{k_1, \dots, k_j\}$ of distinct indices, $(X_{k_1}, \dots, X_{k_j})$
is equal in distribution to $(X_1, \dots, X_j)$.

Theorem 2.2 (de Finetti [6, Chap. 1.1]). Let $X = \{X_i\}_{i \ge 1}$ be an
exchangeable sequence of real-valued random variables. There is a random
probability measure $\nu$ on $\mathbb{R}$ such that $\{X_i\}_{i \ge 1}$ is
conditionally i.i.d. with respect to $\nu$. That is,

$$\mathbb{P}[X \in \cdot \mid \nu] = \nu^\infty \quad \text{a.s.} \tag{1}$$

Moreover, $\nu$ is a.s. unique and given by

$$\nu B = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_B(X_i) \quad \text{a.s.}, \tag{2}$$

where $B$ ranges over $\mathcal{B}_\mathbb{R}$. □
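The following Python snippet is a purely illustrative sketch of (2). It uses the Beta-Bernoulli process of Section 2.1, where $\nu$ is the Bernoulli measure with a Beta-distributed coin weight $\theta$, so that $\nu\{1\} = \theta$. The function names and parameter values are our own assumptions. The snippet samples $\theta$, draws a long i.i.d. sequence given $\theta$, and compares $\nu B$ with the finite average in (2) for $B = \{1\}$. It shows the limit only empirically and gives no computable rate of convergence.

```python
import random

def sample_directed_sequence(alpha, beta, n, rng):
    """Draw theta ~ Beta(alpha, beta), then n i.i.d. Bernoulli(theta) bits.

    Here nu is the Bernoulli(theta) measure on {0, 1}, so nu{1} = theta.
    """
    theta = rng.betavariate(alpha, beta)
    xs = [1 if rng.random() < theta else 0 for _ in range(n)]
    return theta, xs

def empirical_measure(xs, indicator):
    """The finite-n average (1/n) * sum_i 1_B(X_i) appearing in Eq. (2)."""
    return sum(indicator(x) for x in xs) / len(xs)

rng = random.Random(0)
theta, xs = sample_directed_sequence(2.0, 3.0, 100_000, rng)
estimate = empirical_measure(xs, lambda x: 1 if x == 1 else 0)  # B = {1}
print(theta, estimate)
```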
The random measure $\nu$ is called the directing random measure. Its
distribution (a measure on probability measures), which we denote by $\mu$, is
called the de Finetti measure or the mixing measure. As in Kallenberg [6, Chap.
1, Eq. 3], we may take expectations on both sides of (1) to arrive at a
characterization

$$\mathbb{P}\{X \in \cdot\} = \mathbb{E}\,\nu^\infty = \int m^\infty \, \mu(dm) \tag{3}$$

of an exchangeable sequence as a mixture of i.i.d. sequences.

A Bayesian perspective suggests the following interpretation: exchangeable
sequences arise from independent observations from some latent random measure.
Posterior analysis follows from placing a prior distribution on $\nu$. For
further discussion of the implications of de Finetti's theorem for the
foundations of statistical inference, see Dawid [7] and Lauritzen [8].

In 1931, de Finetti [9] proved the classical result for binary exchangeable 
sequences, in which case the de Finetti measure is simply a mixture of Bernoulli 
distributions; the exchangeable sequence is equivalent to repeatedly flipping a 
coin whose weight is drawn from some distribution on [0, 1]. In the 1950s, Hewitt 
and Savage [10] and Ryll-Nardzewski [11] extended the result to arbitrary real- 
valued exchangeable sequences. We will refer to this more general version as 
the de Finetti theorem. Hewitt and Savage [10] provide a history of the early 
developments, and a discussion of some subsequent extensions can be found in 
Kingman [12], Diaconis and Freedman [13], and Aldous [14]. A recent book by 
Kallenberg [6] provides a comprehensive view of the area of probability theory 
that has grown out of de Finetti's theorem, stressing the role of invariance under 
symmetries. 

2.1. Examples 

Consider an exchangeable sequence of $[0,1]$-valued random variables. In this
case, the de Finetti measure is a distribution on the (Borel) measures on
$[0,1]$. For example, if the de Finetti measure is a Dirac measure on the uniform
distribution on $[0,1]$ (i.e., the distribution of a random measure which is
almost surely the uniform distribution), then the induced exchangeable sequence
consists of independent, uniformly distributed random variables on $[0,1]$.

As another example, let $p$ be a random variable, uniformly distributed on
$[0,1]$, and let $\nu := \delta_p$. Then the de Finetti measure is the uniform
distribution on Dirac measures on $[0,1]$, and the corresponding exchangeable
sequence is $p, p, \dots$, i.e., a constant sequence, marginally uniformly
distributed.
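Both examples can be sampled by the same two-stage scheme: draw the random measure $\nu$ once, then draw i.i.d. samples from it. The following Python sketch assumes this scheme. `sample_mixture`, `dirac_on_uniform`, and `uniform_on_diracs` are hypothetical names, and a measure is represented simply as a zero-argument sampling function.

```python
import random

def sample_mixture(draw_measure, n, rng):
    """Sample X_1..X_n by first drawing a random measure nu, then n i.i.d. draws.

    draw_measure(rng) must return a zero-argument sampler for the drawn nu.
    """
    sampler = draw_measure(rng)
    return [sampler() for _ in range(n)]

def dirac_on_uniform(rng):
    # de Finetti measure = Dirac mass at Uniform[0,1]: nu is a.s. the uniform law.
    return lambda: rng.random()

def uniform_on_diracs(rng):
    # de Finetti measure = law of delta_p with p ~ Uniform[0,1]: nu = delta_p.
    p = rng.random()
    return lambda: p

rng = random.Random(1)
iid = sample_mixture(dirac_on_uniform, 5, rng)     # independent uniforms
const = sample_mixture(uniform_on_diracs, 5, rng)  # p, p, p, p, p
```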

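As a further example, we consider the stochastic process $\{X_i\}_{i \ge 1}$
whose finite marginals are given by

$$\mathbb{P}\{X_1 = x_1, \dots, X_n = x_n\} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + S_n)\,\Gamma(\beta + (n - S_n))}{\Gamma(\alpha + \beta + n)} \tag{4}$$

for $n \ge 1$ and $x_1, \dots, x_n \in \{0, 1\}$, where $S_n := \sum_{i \le n} x_i$,
where $\Gamma$ is the Gamma function and $\alpha, \beta$ are positive real
numbers. (One can verify that these marginals satisfy Kolmogorov's extension
theorem [5, Theorem 6.16], and so there is a stochastic process
$\{X_i\}_{i \ge 1}$ with these finite marginals.) Clearly this process is
exchangeable, as $n$ and $S_n$ are invariant to order. This process can also be
described by a sequential scheme known as Pólya's urn [15, Chap. 11.4]. Each
$X_i$ is sampled in turn according to the conditional distribution

$$\mathbb{P}\{X_{n+1} = 1 \mid X_1 = x_1, \dots, X_n = x_n\} = \frac{\alpha + S_n}{\alpha + \beta + n}. \tag{5}$$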
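The marginals (4) and the predictive rule (5) can be checked against each other numerically. In the Python sketch below, the names `marginal` and `predictive` are ours, and $\alpha = 2$, $\beta = 3$ is an arbitrary choice. Equation (4) is evaluated in log space with `math.lgamma`. The checks confirm three things: (4) sums to one over $\{0,1\}^4$, it depends on $(x_1, \dots, x_n)$ only through $S_n$, and the ratio of consecutive marginals is exactly (5).

```python
import itertools
import math

def marginal(xs, alpha, beta):
    """P{X_1 = x_1, ..., X_n = x_n} from Eq. (4), computed in log space."""
    n, s = len(xs), sum(xs)
    log_p = (math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
             + math.lgamma(alpha + s) + math.lgamma(beta + n - s)
             - math.lgamma(alpha + beta + n))
    return math.exp(log_p)

def predictive(xs, alpha, beta):
    """P{X_{n+1} = 1 | X_1 = x_1, ..., X_n = x_n} from Eq. (5)."""
    return (alpha + sum(xs)) / (alpha + beta + len(xs))

alpha, beta = 2.0, 3.0
xs = (1, 0, 1, 1)

# (4) defines a probability distribution on {0, 1}^4 ...
total = sum(marginal(seq, alpha, beta) for seq in itertools.product((0, 1), repeat=4))
# ... that depends only on S_n, so reordering the bits changes nothing ...
same = math.isclose(marginal((1, 0, 1, 1), alpha, beta), marginal((1, 1, 1, 0), alpha, beta))
# ... and whose one-step conditionals are exactly (5).
ratio = marginal(xs + (1,), alpha, beta) / marginal(xs, alpha, beta)
```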

This is often described as an urn which starts with $\alpha$ red balls and
$\beta$ black balls, where at each stage, a ball is drawn uniformly at random,
and is returned to the urn along with an additional ball of the same color. By
de Finetti's theorem, there exists a random coin weight $\theta$ with respect to
which this process is conditionally independent and
$\mathbb{P}\{X_i = 1 \mid \theta\} = \theta$ for each $i$. In fact,

$$\mathbb{P}\{X_1 = x_1, \dots, X_n = x_n \mid \theta\} = \prod_{i \le n} \mathbb{P}\{X_i = x_i \mid \theta\} = \theta^{S_n}(1 - \theta)^{n - S_n}. \tag{6}$$

Furthermore, one can show that here $\theta$ is Beta$(\alpha, \beta)$-distributed,
and so the process given by the marginals (4) is called the Beta-Bernoulli
process. Here the de Finetti measure is the distribution of the Bernoulli
measure with random coin weight $\theta$.
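The two descriptions of this process correspond to two kinds of programs, echoing the discussion of non-local state in Section 1. The Python sketch below is an illustration under our own naming, not the transformation of Section 6. `polya_urn` mutates its ball counts between draws. `de_finetti_sampler` instead draws $\theta \sim \mathrm{Beta}(\alpha, \beta)$ once, using the classical fact stated above, and then flips independent coins. A Monte Carlo comparison of two quantities should agree up to sampling error: $\mathbb{P}\{X_1 = 1\} = \alpha/(\alpha+\beta)$ and $\mathbb{P}\{X_1 = X_2 = 1\} = \alpha(\alpha+1)/((\alpha+\beta)(\alpha+\beta+1))$.

```python
import random

def polya_urn(alpha, beta, n, rng):
    """Sequential scheme of Eq. (5): the counts are mutable, non-local state."""
    reds, total = alpha, alpha + beta
    xs = []
    for _ in range(n):
        x = 1 if rng.random() < reds / total else 0
        reds += x    # the drawn ball goes back with one more of the same colour
        total += 1
        xs.append(x)
    return xs

def de_finetti_sampler(alpha, beta, n, rng):
    """Representation of Eq. (6): draw theta once, then independent coin flips."""
    theta = rng.betavariate(alpha, beta)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def pair_stats(sampler, trials, rng, alpha=2.0, beta=3.0):
    """Monte Carlo estimates of P{X_1 = 1} and P{X_1 = X_2 = 1}."""
    one = both = 0
    for _ in range(trials):
        x1, x2 = sampler(alpha, beta, 2, rng)
        one += x1
        both += x1 * x2
    return one / trials, both / trials

rng = random.Random(42)
urn = pair_stats(polya_urn, 100_000, rng)
dfm = pair_stats(de_finetti_sampler, 100_000, rng)
# Exact values for alpha = 2, beta = 3: 2/5 = 0.4 and (2 * 3) / (5 * 6) = 0.2.
```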

2.2. The Computable de Finetti Theorem 

In each of these examples, the de Finetti measure is a computable measure. 
(In Section 3, we make this and related notions precise. For an explicit
computable representation of the Beta-Bernoulli process, see Section 6.) A natural
question to ask is whether computable exchangeable sequences always arise from 
computable de Finetti measures. In fact, computable de Finetti measures give 
rise to computable distributions on exchangeable sequences (see Proposition 5.1). 
Our main result is the converse: every computable distribution on real-valued ex- 
changeable sequences arises from a computable de Finetti measure. 

Theorem 2.3 (Computable de Finetti). Let $\chi$ be the distribution of a
real-valued exchangeable sequence $X$, and let $\mu$ be the distribution of its
directing random measure $\nu$. Then $\mu$ is uniformly computable in $\chi$, and
$\chi$ is uniformly computable in $\mu$. In particular, $\chi$ is computable if
and only if $\mu$ is computable.

The directing random measure is classically given a.s. by the explicit limiting 
expression (2). Without a computable handle on the rate of convergence, the limit 
is not directly computable, and so we cannot use this limit directly to compute the 
de Finetti measure. However, we are able to reconstruct the de Finetti measure 
using the moments of a set of derived random variables. 
2.2.1. Outline of the Proof

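Recall that $\mathcal{B}_\mathbb{R}$ denotes the Borel sets of $\mathbb{R}$. Let
$\mathcal{I}_\mathbb{R}$ denote the set of open intervals, and let
$\mathcal{I}_\mathbb{Q}$ denote the set of open intervals with rational
endpoints. Then $\mathcal{I}_\mathbb{Q} \subset \mathcal{I}_\mathbb{R} \subset \mathcal{B}_\mathbb{R}$.
For $k \ge 1$ and
$\beta \in \mathcal{B}_\mathbb{R}^k = \mathcal{B}_\mathbb{R} \times \cdots \times \mathcal{B}_\mathbb{R}$,
we write $\beta(i)$ to denote the $i$th coordinate of $\beta$.

We will denote by $\mathcal{A}_\mathbb{R}^k$ the algebra generated by
$\mathcal{I}_\mathbb{R}^k$ (i.e., finite unions of open rectangles in
$\mathbb{R}^k$), and denote by $\mathcal{A}_\mathbb{Q}^k$ the algebra generated
by $\mathcal{I}_\mathbb{Q}^k$. Note that

$$\mathcal{I}_\mathbb{Q}^k \subset \mathcal{A}_\mathbb{Q}^k \subset \mathcal{A}_\mathbb{R}^k \subset \mathcal{B}_\mathbb{R}^k.$$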

Let $X = \{X_i\}_{i \ge 1}$ be an exchangeable sequence of real random
variables, with distribution $\chi$ and directing random measure $\nu$. For
every $\gamma \in \mathcal{B}_\mathbb{R}$, we define a $[0,1]$-valued random
variable $V_\gamma := \nu\gamma$. A classical result in probability theory
[5, Lem. 1.17] implies that a Borel measure on $\mathbb{R}$ is uniquely
characterized by the mass it places on the open intervals with rational
endpoints. Therefore, the distribution of the stochastic process
$\{V_\tau\}_{\tau \in \mathcal{I}_\mathbb{Q}}$ determines the de Finetti measure
$\mu$ (the distribution of $\nu$).

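Definition 2.4 (Mixed moments). Let $C \subseteq \mathcal{B}_\mathbb{R}$. The
mixed moments of the variables $\{V_\beta\}_{\beta \in C}$ are the set of all
expectations $\mathbb{E}\big(\prod_{i=1}^k V_{\beta(i)}\big)$, for $k \ge 1$ and
$\beta \in C^k$.

We can now restate the consequence of de Finetti's theorem described in Eq. (3),
in terms of the finite-dimensional marginals of the exchangeable sequence $X$
and the mixed moments of $\{V_\beta\}_{\beta \in \mathcal{B}_\mathbb{R}}$.

Corollary 2.5. $\mathbb{P}\big(\bigcap_{i=1}^k \{X_i \in \beta(i)\}\big) = \mathbb{E}\big(\prod_{i=1}^k V_{\beta(i)}\big)$
for $k \ge 1$ and $\beta \in \mathcal{B}_\mathbb{R}^k$. □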
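For the Beta-Bernoulli process, Corollary 2.5 can be checked directly. Take every $\beta(i) = (1/2, 3/2)$. The left-hand side is then $\mathbb{P}\{X_1 = \cdots = X_k = 1\}$. Since $\nu$ is the Bernoulli measure with weight $\theta$, we have $V_{\beta(i)} = \theta$, so the right-hand side is the moment $\mathbb{E}(\theta^k)$. The Python sketch below computes the former by chaining (5) and the latter by numerical integration against the Beta density. The function names and the choice $\alpha = 2$, $\beta = 3$ are ours.

```python
import math

def lhs_via_urn(k, alpha, beta):
    """P(X_1 in beta(1), ..., X_k in beta(k)) with every beta(i) = (1/2, 3/2).

    For {0, 1}-valued X this is P(X_1 = ... = X_k = 1), obtained by chaining
    the predictive rule (5): after j ones, S_j = j.
    """
    p = 1.0
    for j in range(k):
        p *= (alpha + j) / (alpha + beta + j)
    return p

def rhs_via_moment(k, alpha, beta, grid=20_000):
    """E(prod_i V_{beta(i)}) = E(theta^k), since nu(1/2, 3/2) = nu{1} = theta.

    Computed by the midpoint rule against the Beta(alpha, beta) density.
    """
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) * h
        log_density = (alpha - 1) * math.log(t) + (beta - 1) * math.log1p(-t) - log_norm
        total += t ** k * math.exp(log_density) * h
    return total

alpha, beta = 2.0, 3.0
pairs = [(lhs_via_urn(k, alpha, beta), rhs_via_moment(k, alpha, beta)) for k in range(1, 5)]
```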

As we will show in Lemma 3.5, when $\chi$ is computable, we can enumerate all
rational lower bounds on quantities of the form

$$\mathbb{P}\Big(\bigcap_{i=1}^k \{X_i \in \sigma(i)\}\Big), \tag{7}$$

where $\sigma \in \mathcal{I}_\mathbb{Q}^k$. We can also enumerate rational
upper bounds on (7), provided that $\chi$ places no mass on the boundary of any
$\sigma(i)$. In particular, if $\nu$ is a.s. continuous (i.e., with probability
one, $\nu\{x\} = 0$ for every $x \in \mathbb{R}$), then we can use $\chi$ to
compute the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{I}_\mathbb{Q}}$.

In Section 4, we show how to computably recover a distribution from its
moments. This suffices to recover the de Finetti measure when $\nu$ is a.s.
continuous, as we show in Section 5.1. In the general case, fixed point masses
in $\nu$ prevent us from computing the mixed moments. Here we use a proof
inspired by a randomized algorithm which almost surely avoids the point masses
and recovers the de Finetti measure. For the complete proof, see Section 5.3.
3. Computable Representations

Before beginning the proof of the computable de Finetti theorem, we first de- 
fine computable probability measures on various spaces. These definitions follow 
from more general TTE notions, though we will sometimes derive simpler equiv- 
alent representations for the concrete spaces we need (such as the real numbers, 
Borel measures on reals, and Borel measures on Borel measures on reals). For 
details, see the original papers, as noted. 

We assume familiarity with the standard notions of computability theory and
computably enumerable (c.e.) sets (see, e.g., Rogers [16] or Soare [17]). Recall
that $r \in \mathbb{R}$ is called a c.e. real when the set of all rationals less
than $r$ is a c.e. set. Similarly, $r$ is a co-c.e. real when the set of all
rationals greater than $r$ is c.e. A real $r$ is a computable real when it is
both a c.e. and co-c.e. real.
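For a concrete instance of these definitions, $\sqrt{2}$ is both a c.e. and a co-c.e. real. Membership of a rational $q$ in either set below is decidable by comparing $q^2$ with $2$. The Python sketch below enumerates both sets by dovetailing over all rationals; all names in it are hypothetical. Taking the best bounds among finitely many outputs brackets $\sqrt{2}$.

```python
from fractions import Fraction
from itertools import count, islice

def rationals():
    """Enumerate all rationals (with repeats): at height h, every num/den with
    |num| <= h and 1 <= den <= h."""
    for h in count(1):
        for den in range(1, h + 1):
            for num in range(-h, h + 1):
                yield Fraction(num, den)

def below_sqrt2():
    """The c.e. set {q in Q : q < sqrt(2)}; membership is decided exactly here."""
    return (q for q in rationals() if q < 0 or q * q < 2)

def above_sqrt2():
    """The c.e. set {q in Q : q > sqrt(2)}, witnessing that sqrt(2) is co-c.e."""
    return (q for q in rationals() if q > 0 and q * q > 2)

lower = max(islice(below_sqrt2(), 2000))
upper = min(islice(above_sqrt2(), 2000))
```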

To represent more general spaces, we work in terms of an effectively presented
topology. Suppose that $S$ is a second-countable $T_0$ topological space with
subbasis $\mathcal{S}$. For every point $x \in S$, define the set
$C_x := \{B \in \mathcal{S} : x \in B\}$. Because $S$ is $T_0$, we have
$C_x \ne C_y$ when $x \ne y$, i.e., $x$ is the unique point that is contained in
those subbasis sets that are elements of $C_x$ and no others. Since every point
in a $T_0$ space is determined by the subbasis in this way, it is convenient to
define representations on topological spaces under the assumption that the
space is $T_0$. In the specific cases below, we often have much more structure,
which we use to simplify the representations.

We now develop these definitions more formally. 

Definition 3.1 (Computable topological space). Let $S$ be a second-countable
$T_0$ topological space with a countable subbasis $\mathcal{S}$. Let
$s : \omega \to \mathcal{S}$ be an enumeration of $\mathcal{S}$. We say that $S$
is a computable topological space (with respect to $s$) when the set

$$\{\langle A, B \rangle : A = B\} \tag{8}$$

is c.e. in terms of the $s$-indices for $A, B \in \mathcal{S}$, i.e., when

$$\{\langle a, b \rangle : s(a) = s(b)\} \tag{9}$$

is a c.e. subset of $\omega$, where $\langle \cdot, \cdot \rangle$ is the
standard pairing function.

Computable topological spaces, as defined here, are instances of the computable
$T_0$ spaces defined in Grubba, Schröder, and Weihrauch [18, §3], when the
subbasis is replaced by the basis it generates (and the enumeration is extended
in a canonical way).

It is often possible to pick a subbasis $\mathcal{S}$ (and enumeration $s$) for
which the elemental "observations" that one can computably observe are those of
the form $x \in B$, where $B \in \mathcal{S}$. Then the set
$C_x = \{B \in \mathcal{S} : x \in B\}$ is computably enumerable (with respect
to $s$) when the point $x$ is such that it is eventually noticed to be in each
basic open set containing it; we will call such a point $x$ computable. This is
one motivation for the definition of computable point in a $T_0$ space below.

Note that in a $T_1$ space, two computable points are computably
distinguishable, but in a $T_0$ space, computable points will be, in general,
distinguishable only in a computably enumerable fashion. However, this is
essentially the best that is possible, if the open sets are those we can
"observe". (For more details on this approach to considering datatypes as
topological spaces, in which basic open sets correspond to "observations", see
Battenfeld, Schröder, and Simpson [19, §2].) Note that the choice of topology and
subbasis is essential; for example, we can recover both computable reals and
c.e. reals as instances of "computable point" for appropriate computable
topological spaces, as we describe in Section 3.1.

Definition 3.2 (Computable points). Let $(S, \mathcal{S})$ be a computable
topological space with respect to an enumeration $s$. We say that a point
$x \in S$ is computable (with respect to $s$) when the set

$$\{B \in \mathcal{S} : x \in B\} \tag{10}$$

is c.e. (in terms of the $s$-indices for $B \in \mathcal{S}$). We call the set
(10) the representation of $x$.

Suppose that $A$ and $B$ are computable objects (either points, as defined here,
or computable functions or measures, as defined below). We will say that $A$ is
uniformly computable in $B$, or that $A$ is uniformly computable relative to
$B$, when there is a single program (even as $B$ varies) that computes the
representation of $A$ using the representation of $B$ as an oracle (and
similarly for computable enumerability). These representations, in turn, can be
made into explicit subsets of $\omega$ using the relevant enumerations, as in
(9).

3.1. Representations of Reals 

We will use both the standard topology and right order topology on the real
line $\mathbb{R}$. The reals under the standard topology are a computable
topological space using the basis $\mathcal{I}_\mathbb{Q}$ with respect to the
canonical enumeration. The reals under the right order topology are a computable
topological space using the basis

$$\mathcal{R}_< := \{(c, \infty) : c \in \mathbb{Q}\} \tag{11}$$

under the canonical enumeration. Note that the computable points of
$(\mathbb{R}, \mathcal{I}_\mathbb{Q})$ are precisely the computable reals, and
the computable points of $(\mathbb{R}, \mathcal{R}_<)$ are precisely the c.e.
reals.
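The following Python sketch shows how a computable point of $(\mathbb{R}, \mathcal{I}_\mathbb{Q})$ is "eventually noticed" to lie in a basic open set. It assumes the point is given by a hypothetical oracle `approx(n)` that returns a rational within $2^{-n}$ of $x$; here $x = \sqrt{2}$. An unbounded search halts exactly when $x \in (a, b)$. Dovetailing this search over all of $\mathcal{I}_\mathbb{Q}$ would enumerate the representation (10). That step is omitted here, and each run is cut off after a fixed budget.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_approx(n):
    """Rational within 2**-n of sqrt(2): floor(sqrt(2) * 2**n) / 2**n."""
    return Fraction(isqrt(2 * 4 ** n), 2 ** n)

def confirms_membership(approx, a, b, max_steps=64):
    """Semi-decide x in (a, b): succeed once some approximation interval fits.

    Returns True when membership is confirmed and None when the budget runs
    out; without the budget the loop would halt exactly when x lies in (a, b).
    """
    for n in range(max_steps):
        q, eps = approx(n), Fraction(1, 2 ** n)
        if a < q - eps and q + eps < b:  # [q - eps, q + eps] contains x
            return True
    return None
```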
3.2. Continuous Real Functions

We now consider computable representations for continuous functions on the 
reals. 

Definition 3.3 (Computable real function). Let $\mathcal{S}$ and $\mathcal{T}$
each be either of $\mathcal{I}_\mathbb{Q}$ or $\mathcal{R}_<$, under their
canonical enumerations $s$ and $t$. Let $\overline{A}$ denote the closure of the
set $A$. Fix $k \ge 1$. We say that a continuous function
$f : (\mathbb{R}^k, \mathcal{S}^k) \to (\mathbb{R}, \mathcal{T})$ is computable
when

$$\{(A, B) \in \mathcal{S}^k \times \mathcal{T} : f(\overline{A}) \subseteq B\} \tag{12}$$

is a c.e. set (in terms of the $s^k$-indices for $A \in \mathcal{S}^k$ and
$t$-indices for $B \in \mathcal{T}$, where $s^k$ is the canonical enumeration
determined by $s$). We call the set (12) the representation of $f$.

This definition is computably equivalent to the canonical construction of
computable functions between two represented spaces [20, Ch. 6]. Note that when
$\mathcal{S} = \mathcal{T} = \mathcal{I}_\mathbb{Q}$, this recovers the standard
definition of a computable real function. When $\mathcal{S} = \mathcal{I}_\mathbb{Q}$
and $\mathcal{T} = \mathcal{R}_<$, this recovers the standard definition of a
lower-semicomputable real function [21].
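To make the set (12) concrete, consider $k = 1$, $\mathcal{S} = \mathcal{T} = \mathcal{I}_\mathbb{Q}$, and the illustrative choice $f(x) = x^2$. For this $f$, whether $f(\overline{A}) \subseteq B$ holds can be decided exactly with rational arithmetic. Dovetailing the test over all pairs therefore enumerates (12). The function names in the Python sketch below are ours.

```python
from fractions import Fraction

def square_image(a, b):
    """Exact image [lo, hi] of the closed interval [a, b] under f(x) = x**2."""
    lo = Fraction(0) if a <= 0 <= b else min(a * a, b * b)
    return lo, max(a * a, b * b)

def in_representation(A, B):
    """Decide whether (A, B) belongs to the set (12) for f(x) = x**2.

    A = (a1, a2) and B = (b1, b2) are open rational intervals; the test is that
    the image of the closure [a1, a2] lies inside the open interval B.
    """
    lo, hi = square_image(*A)
    return B[0] < lo and hi < B[1]

F = Fraction
print(in_representation((F(1), F(2)), (F(0), F(5))))  # image [1, 4] fits in (0, 5)
```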

3.3. Representations of Borel Probability Measures 

The following representations for probability measures on computable
topological spaces are devised from more general TTE representations in
Schröder [22] and Bosserhoff [23], and agree with Weihrauch [24] in the case of
the unit interval. In particular, the representation for $\mathcal{M}_1(S)$
below is admissible with respect to the weak topology, hence computably
equivalent (see Weihrauch [20, Chap. 3]) to the canonical representation for
Borel measures given in Schröder [22].

Schröder [22] has also shown the equivalence of this representation for
probability measures (as a computable space under the weak topology) with
probabilistic processes. A probabilistic process (see Schröder and Simpson [25])
formalizes the notion of a program which uses randomness to sample points in
terms of their representations of the form (10).

Let $\mathcal{M}_1(S)$ denote the set of Borel probability measures on a
second-countable $T_0$ topological space $S$ (i.e., the probability measures on
the $\sigma$-algebra generated by the open sets of $S$). Provided that the
subbasis $\mathcal{S}$ is closed under finite intersection, such measures are
determined by the measure they assign to elements of $\mathcal{S}$. Note that
$\mathcal{M}_1(S)$ is a second-countable $T_0$ space.

Let $S$ be a computable topological space. We will describe a subbasis for
$\mathcal{M}_1(S)$ which makes it a computable topological space. Let
$\mathcal{A}_\mathcal{S}$ denote the algebra generated by the subbasis
$\mathcal{S}$ of $S$ (i.e., the closure of $\mathcal{S}$ under finite union and
complementation). Then, the class of sets

$$\{\gamma \in \mathcal{M}_1(S) : \gamma\sigma > q\}, \tag{13}$$

where $\sigma \in \mathcal{A}_\mathcal{S}$ and $q \in \mathbb{Q}$, is a subbasis
for the weak topology on $\mathcal{M}_1(S)$. An effective enumeration of this
subbasis can be constructed in a canonical fashion from the enumeration of
$\mathcal{S}$ and the rationals, making $\mathcal{M}_1(S)$ a computable
topological space.

Definition 3.4 (Computable distribution). Let $S$ be a computable topological
space with respect to $s$, and let $\mathcal{A}_\mathcal{S}$ be the algebra
generated by the subbasis $\mathcal{S}$, under its canonical enumeration $s'$
induced by $s$. A Borel probability measure $\eta \in \mathcal{M}_1(S)$ is
computable (with respect to $s$) when $\eta B$ is a c.e. real, uniformly in the
$s'$-index of $B \in \mathcal{A}_\mathcal{S}$. We call the set
$\{(B, q) \in \mathcal{A}_\mathcal{S} \times \mathbb{Q} : \eta B > q\}$ the
representation of $\eta$.

In particular, this implies that the measure of a c.e. open set (i.e., the c.e. 
union of basic open sets) is a c.e. real (uniformly in the enumeration of the terms 
in the union), and that the measure of a co-c.e. closed set (i.e., the complement 
of a c.e. open set) is a co-c.e. real (similarly uniformly); see, e.g., [26, §3.3] 
for details. Note that on a discrete space, where singletons are both c.e. open 
and co-c.e. closed, the measure of each singleton is a computable real. But for a 
general space, it is too strong to require that even basic open sets have computable 
measure (as it is more than is needed to ensure that exact samples be computably 
described to arbitrary accuracy). 
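As a small illustration of the representation in Definition 3.4, take $\eta$ to be the uniform (Lebesgue) measure on $[0,1]$. For brevity, we restrict attention to sets $B$ that are finite unions of open rational intervals; this is an assumption, since the algebra in the definition is larger. Here $\eta B$ is an exact rational, so membership of $(B, q)$ in the representation is even decidable, whereas for a general computable measure it can only be enumerated. Names in the Python sketch are hypothetical.

```python
from fractions import Fraction

def uniform_measure(intervals):
    """Lebesgue measure of a finite union of rational open intervals, clipped to [0, 1]."""
    clipped = sorted((max(a, Fraction(0)), min(b, Fraction(1))) for a, b in intervals)
    total, cur_a, cur_b = Fraction(0), None, None
    for a, b in clipped:
        if a >= b:
            continue                     # empty after clipping to [0, 1]
        if cur_b is None or a > cur_b:   # disjoint from the current block
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                            # overlapping: extend the current block
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

def in_representation(B, q):
    """Is (B, q) in {(B, q) : eta B > q}?  Decidable here since eta B is rational."""
    return uniform_measure(B) > q

F = Fraction
print(uniform_measure([(F(0), F(1, 2)), (F(1, 4), F(3, 4))]))  # union (0, 3/4)
```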

We will be interested in computable measures $\eta \in \mathcal{M}_1(S)$ where
$S$ is either $\mathbb{R}^\omega$, $[0,1]^\omega$, or $\mathcal{M}_1(\mathbb{R})$,
topologized as described below.

3.3.1. Measures on Real Vectors and Sequences under the Standard Topology 

Consider $\mathbb{R}^\omega$ under the product topology, where $\mathbb{R}$ is
endowed with its standard topology. Note that $\mathbb{R}^\omega$ is a computable
topological space with respect to the canonical enumeration on the basis given
by the cylinders $\sigma \times \mathbb{R}^\omega$ for
$\sigma \in \bigcup_{k \ge 1} \mathcal{I}_\mathbb{Q}^k$. Using Definition 3.4, we
can characterize the class of computable distributions on real sequences.

Let $x = \{x_i\}_{i \ge 1}$ be a sequence of real-valued random variables (e.g.,
the exchangeable sequence $X$, or the derived random variables
$\{V_\tau\}_{\tau \in \mathcal{I}_\mathbb{Q}}$ under the canonical enumeration of
$\mathcal{I}_\mathbb{Q}$). Then the joint distribution $\eta$ of $x$ is
computable when
$\eta(\sigma \times \mathbb{R}^\omega) = \mathbb{P}\{x \in \sigma \times \mathbb{R}^\omega\}$
is a c.e. real, uniformly in $k \ge 1$ and $\sigma \in \mathcal{A}_\mathbb{Q}^k$.
The following simpler representation was shown to be equivalent by Müller
[27, Thm. 3.7].



Lemma 3.5 (Computable distribution under the standard topology). Let
$x = \{x_i\}_{i \ge 1}$ be a sequence of real-valued random variables with joint
distribution $\eta$. Then $\eta$ is computable if and only if

$$\eta(\tau \times \mathbb{R}^\omega) = \mathbb{P}\Big(\bigcap_{i=1}^k \{x_i \in \tau(i)\}\Big) \tag{14}$$

is a c.e. real, uniformly in $k \ge 1$ and $\tau \in \mathcal{I}_\mathbb{Q}^k$. □
Therefore knowing the measure of the sets in
$\bigcup_k \mathcal{I}_\mathbb{Q}^k \subset \bigcup_k \mathcal{A}_\mathbb{Q}^k$ is
sufficient. Note that (14) is precisely the form of the first expression in
Corollary 2.5. Note also that one obtains a characterization of the
computability of the distribution of a finite-dimensional vector by embedding it
in an initial segment of a sequence.

3.3.2. Measures on Real Vectors and Sequences under the Right Order Topology

Distributions on c.e. reals play an important role when representing measures
on measures, because, as Definition 3.4 indicates, measures are themselves
represented by collections of c.e. reals lower-bounding the measure of an
element of an algebra. The set $\mathcal{R}_<^k$ is a basis for the product of
the right order topology on $\mathbb{R}^k$ that makes
$(\mathbb{R}^k, \mathcal{R}_<^k)$ a computable topological space (under the
canonical enumeration of $\mathcal{R}_<^k$).

Corollary 3.6 (Computable distribution under the right order topology). For
$m, k \ge 1$, let $w = (w_1, \dots, w_k)$ be a random vector in $\mathbb{R}^k$,
and let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. The joint distribution of $w$
is computable under the right order topology when
$\mathbb{P}\big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{w_j > c_{ij}\}\big)$ is a c.e.
real, uniformly in $C$. □

Note that the joint distribution of a random vector in $\mathbb{R}^\omega$ is
computable under the right order topology when all its restrictions to
$\mathbb{R}^k$ are computable under the right order topology, uniformly in $k$.
Note also that if a distribution on $\mathbb{R}^k$ is computable under the
standard topology, then it is clearly computable under the right order topology.
The above representation is used in the next section as well as in
Proposition 5.1, where we must compute an integral with respect to a topology
that is weaker than the standard topology.

3.3.3. Measures on Borel Measures 

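The de Finetti measure $\mu$ is the distribution of the directing random measure
$\nu$, an $\mathcal{M}_1(\mathbb{R})$-valued random variable. Recall the
definition $V_\beta := \nu\beta$ for $\beta \in \mathcal{B}_\mathbb{R}$. Then
$\mu$ is computable (with respect to an enumeration $s$ of the weak topology)
when

$$\mu\Big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{\gamma \in \mathcal{M}_1(\mathbb{R}) : \gamma\sigma(j) > c_{ij}\}\Big) = \mathbb{P}\Big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{V_{\sigma(j)} > c_{ij}\}\Big) \tag{15}$$

is a c.e. real, uniformly in $\sigma = (\sigma(1), \dots, \sigma(k))$ with each
$\sigma(j) \in \mathcal{A}_\mathbb{Q}^1$ and in
$C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. As an immediate consequence of (15)
and Corollary 3.6, we obtain the following representation for the de Finetti
measure.

Corollary 3.7 (Computable de Finetti measure). The de Finetti measure $\mu$ is
computable if and only if the joint distribution of
$\{V_\tau\}_{\tau \in \mathcal{A}_\mathbb{Q}^1}$ is computable under the right
order topology. □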
3.3.4. Integration

The following lemma is a restatement of an integration result by Schröder
[22, Prop. 3.6], which itself generalizes integration results on standard
topologies of finite-dimensional Euclidean spaces by Müller [27] and the unit
interval by Weihrauch [24].

Define

$$\mathcal{I} := \{A \cap [0,1] : A \in \mathcal{I}_\mathbb{Q}\}, \tag{16}$$

which is a basis for the standard topology on $[0,1]$, and define

$$\mathcal{I}_< := \{A \cap [0,1] : A \in \mathcal{R}_<\}, \tag{17}$$

which is a basis for the right order topology on $[0,1]$.

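Lemma 3.8 (Integration of bounded lower-semicomputable functions). Let $k \ge 1$
and let $\mathcal{S}$ be either $\mathcal{I}_\mathbb{Q}$ or $\mathcal{R}_<$. Let

$$f : (\mathbb{R}^k, \mathcal{S}^k) \to ([0,1], \mathcal{I}_<) \tag{18}$$

be a computable function and let $\mu$ be a computable distribution on
$(\mathbb{R}^k, \mathcal{S}^k)$. Then

$$\int f \, d\mu \tag{19}$$

is a c.e. real, uniformly in $f$ and $\mu$. □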

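The following result of Müller [27] is an immediate corollary: apply Lemma 3.8
to both $g$ and $1 - g$, so that $\int g \, d\mu$ is both c.e. and co-c.e.

Corollary 3.9 (Integration of bounded computable functions). Let

$$g : (\mathbb{R}^k, \mathcal{I}_\mathbb{Q}^k) \to ([0,1], \mathcal{I}) \tag{20}$$

be a computable function and let $\mu$ be a computable distribution on
$(\mathbb{R}^k, \mathcal{I}_\mathbb{Q}^k)$. Then

$$\int g \, d\mu \tag{21}$$

is a computable real, uniformly in $g$ and $\mu$. □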
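The argument behind Corollary 3.9 can be seen numerically in a simple case. For $f$ with values in $[0,1]$, we have $\int f \, d\mu = \int_0^1 \mu\{f > t\} \, dt$, and a right Riemann sum of this nonincreasing function is a lower bound. Applying the same bound to $1 - g$ yields an upper bound for $\int g \, d\mu$. The Python sketch below assumes $g(x) = x^2$ and $\mu$ uniform on $[0,1]$. It supplies the sets $\mu\{g > t\}$ in closed form rather than enumerating them, and the resulting bracket around $1/3$ has width $1/N$.

```python
import math

def lower_integral(survival, N):
    """Lower bound on integral f dmu = integral_0^1 mu{f > t} dt, N thresholds.

    survival(t) returns mu{f > t}, which is nonincreasing in t, so the right
    Riemann sum (1/N) * sum_{j=1}^{N} mu{f > j/N} never overshoots.
    """
    return sum(survival(j / N) for j in range(1, N + 1)) / N

def surv_g(t):
    """mu{x in [0, 1] : x**2 > t} for the uniform mu (illustrative assumption)."""
    return 1.0 - math.sqrt(t)

def surv_one_minus_g(t):
    """mu{x in [0, 1] : 1 - x**2 > t} for the uniform mu."""
    return math.sqrt(1.0 - t)

N = 1000
lower = lower_integral(surv_g, N)                   # c.e. side of integral g
upper = 1.0 - lower_integral(surv_one_minus_g, N)   # co-c.e. side via 1 - g
```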
4. The Computable Moment Problem



One often has access to the moments of a distribution, and wishes to recover
the underlying distribution. Let $x = (x_i)_{i \in \omega}$ be a random vector in
$[0,1]^\omega$ with distribution $\eta$. Classically, the distribution of $x$ is
uniquely determined by the mixed moments of $x$. We show that the distribution
is in fact computable from the mixed moments.

One classical way to pass from the moments of $x$ to its distribution is via the
Lévy inversion formula, which maps the characteristic function
$\phi_x : \mathbb{R}^\omega \to \mathbb{C}$, given by

$$\phi_x(t) := \mathbb{E}\big(e^{i\langle t, x \rangle}\big), \tag{22}$$

to the distribution of $x$. However, even in the finite-dimensional case, the
inversion formula involves a limit for which we have no direct handle on the
rate of convergence, and so the distribution it defines is not obviously
computable. Instead, we use computable versions of Urysohn's lemma and the
Weierstrass approximation theorem to compute a representation of a distribution
from its moments.

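To show that $\eta$ is computable, it suffices to show that
$\eta(\sigma \times [0,1]^\omega) = \mathbb{E}\big(\mathbf{1}_\sigma(x_1, \dots, x_k)\big)$
is a c.e. real, uniformly in $\sigma \in \bigcup_{k \ge 1} \mathcal{A}_\mathbb{Q}^k$.
We begin by showing how to build sequences of polynomials that converge
pointwise from below to indicator functions of the form $\mathbf{1}_\sigma$ for
$\sigma \in \bigcup_{k \ge 1} \mathcal{A}_\mathbb{Q}^k$.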

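Lemma 4.1 (Polynomial approximations). There is a computable array

$$\{p_{n,\sigma} : n \in \omega,\ \sigma \in \textstyle\bigcup_{k \ge 1} \mathcal{A}_\mathbb{Q}^k\} \tag{23}$$

of rational polynomials where, for $n \in \omega$ and
$\sigma \in \mathcal{A}_\mathbb{Q}^k$, the polynomial $p_{n,\sigma}$ is in $k$
variables and, for each $x \in [0,1]^k$, we have

$$-1 \le p_{n,\sigma}(x) \le \mathbf{1}_\sigma(x) \quad \text{and} \quad \lim_{m \to \infty} p_{m,\sigma}(x) = \mathbf{1}_\sigma(x). \tag{24}$$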

Proof. Fix $n \in \omega$ and $\sigma \in \mathcal{A}_\mathbb{Q}^k$. Let
$U_{n,\sigma}$ denote the set of $x \in \sigma$ for which the $1/n$-ball centered
at $x$ is contained in $\sigma$. We will construct (using a procedure that is
uniform in $n$ and $\sigma$) a rational polynomial $p_{n,\sigma}$ which satisfies

$$-1 \le p_{n,\sigma}(x) \le \mathbf{1}_\sigma(x) \tag{25}$$

for all $x \in [0,1]^k$, and which satisfies

$$\mathbf{1}_\sigma(x) - p_{n,\sigma}(x) \le 2/n \tag{26}$$

for all $x \in U_{n,\sigma}$ and for all $x \in [0,1]^k \setminus \sigma$. Then the
values $\{p_{n,\sigma}(x)\}_{n \ge 1}$ converge to $\mathbf{1}_\sigma(x)$ for
$x \in [0,1]^k \setminus \sigma$, and also converge to $\mathbf{1}_\sigma(x)$ for
$x$ in an increasing sequence of subsets of $\sigma$ whose union is $\sigma$.

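For $n = 1$ we may take $p_{1,\sigma} = 0$. Thus assume $n \ge 2$. Using a
computable version of Urysohn's lemma (see Weihrauch [28]), we can find
(uniformly in $n$ and $\sigma$) a computable real function $f_{n,\sigma}$ which
equals $1 - 1/n$ for all $x \in U_{n,\sigma}$, equals $-1/n$ on
$x \in [0,1]^k \setminus \sigma$, and is in between these two values on
$\sigma \setminus U_{n,\sigma}$.

Then by the effective Weierstrass approximation theorem (see Pour-El and
Richards [29, p. 45]), we can find (uniformly in $n$ and $\sigma$) a polynomial
$p_{n,\sigma}$ with rational coefficients which uniformly approximates
$f_{n,\sigma}$ to within $1/(2n)$ on $[0,1]^k$. This polynomial $p_{n,\sigma}$
has the desired properties. Indeed, $p_{n,\sigma}$ takes values in
$[-3/(2n), 1 - 1/(2n)]$ on $\sigma$ and in $[-3/(2n), -1/(2n)]$ off $\sigma$,
which gives (25) for $n \ge 2$; and $\mathbf{1}_\sigma - p_{n,\sigma} \le 3/(2n) \le 2/n$
on $U_{n,\sigma}$ and off $\sigma$, which gives (26). □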
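The following Python sketch instantiates the proof for $k = 1$ and $\sigma = (a, b) \subset [0,1]$. It substitutes two explicit constructions of our own choosing for the cited results. A piecewise-linear Urysohn function stands in for $f_{n,\sigma}$. For the effective Weierstrass step, it uses the Bernstein polynomial of degree $m = 9n^4$; Popoviciu's bound for Bernstein approximation of an $n$-Lipschitz function gives the required accuracy $1/(2n)$. This is a sketch, not the general construction from [28] and [29]. The test checks (25) and (26) on a grid.

```python
import math

def urysohn(t, a, b, n):
    """Hypothetical 1-D stand-in for f_{n,sigma} with sigma = (a, b) in [0, 1].

    Equals 1 - 1/n on U_{n,sigma} = [a + 1/n, b - 1/n], equals -1/n off (a, b),
    and ramps linearly in between, so it is n-Lipschitz.
    """
    dist = max(0.0, min(t - a, b - t))  # distance from t to the complement of (a, b)
    return -1.0 / n + min(1.0, n * dist)

def make_p(n, a, b):
    """Return p_{n,sigma} as a callable: the Bernstein polynomial of degree 9 n^4.

    Popoviciu: |B_m f - f| <= (3/2) * omega(f, m**-0.5) <= (3/2) * n / sqrt(m),
    which equals 1/(2n) when m = 9 n^4. With rational a, b the nodes f(k/m) are
    rational, so the coefficients are rational (they are evaluated in floats here).
    """
    if n == 1:
        return lambda x: 0.0
    m = 9 * n ** 4
    nodes = [urysohn(k / m, a, b, n) for k in range(m + 1)]
    log_binom = [math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                 for k in range(m + 1)]

    def p(x):
        if x <= 0.0:
            return nodes[0]
        if x >= 1.0:
            return nodes[-1]
        lx, l1x = math.log(x), math.log1p(-x)
        return sum(v * math.exp(lb + k * lx + (m - k) * l1x)
                   for k, (v, lb) in enumerate(zip(nodes, log_binom)))
    return p
```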

Using these polynomials, we can compute the distribution from the moments. 
The other direction follows from computable integration results. 

Theorem 4.2 (Computable moments). Let $x = (x_i)_{i \in \omega}$ be a random
vector in $[0,1]^\omega$ with distribution $\eta$. Then $\eta$ is uniformly
computable in the mixed moments of $\{x_i\}_{i \in \omega}$, and vice versa. In
particular, $\eta$ is computable if and only if the mixed moments of
$\{x_i\}_{i \in \omega}$ are uniformly computable.

Proof. Any monic monomial in $x_1, \dots, x_k$, considered as a real function, computably maps $[0,1]^k$ into $[0,1]$ (under the standard topology). Furthermore, $\eta$ restricts to a computable measure on $\{x_i\}_{i \le k}$. Therefore, by Corollary 3.9, the mixed moments are uniformly computable in $\eta$ and the index of the monomial.

To prove that $\eta$ is computable in the mixed moments, we give a proof which relativizes to a representation of the mixed moments, and so we assume, without loss of generality, that the mixed moments of $\{x_i\}_{i \in \omega}$ are uniformly computable. Let $k \ge 1$ and $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. To establish the computability of $\eta$, it suffices to show that

$$\eta(\sigma \times [0,1]^\omega) = \mathbb{E}\big(1_{\sigma \times [0,1]^\omega}(x)\big) = \mathbb{E}\big(1_\sigma(x_1, \dots, x_k)\big) \tag{27}$$

is a c.e. real, uniformly in $\sigma$. By Lemma 4.1, we are given a uniformly computable sequence of polynomials $\{p_{n,\sigma}\}_{n \in \omega}$ which converge pointwise from below to the indicator $1_\sigma$. By the dominated convergence theorem, and because each $p_{n,\sigma} \le 1_\sigma$,

$$\mathbb{E}\big(1_\sigma(x_1, \dots, x_k)\big) = \sup_n \mathbb{E}\big(p_{n,\sigma}(x_1, \dots, x_k)\big). \tag{28}$$

The expectation $\mathbb{E}(p_{n,\sigma}(x_1, \dots, x_k))$ is a $\mathbb{Q}$-linear combination of moments, hence a computable real. Thus their supremum is a c.e. real. □
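The last step uses only linearity: the expectation of a rational polynomial is read off from finitely many mixed moments. A minimal sketch, assuming polynomials are stored as dictionaries from exponent tuples to rational coefficients and that mixed moments are supplied by a callable (both representation choices are ours; the helper names are hypothetical):

```python
from fractions import Fraction
from math import prod


def expectation_from_moments(poly, moment):
    """E[p(x_1, ..., x_k)] for p given as {exponent tuple: coefficient}.

    `moment(alpha)` returns the mixed moment E[x_1**alpha_1 * ... * x_k**alpha_k];
    by linearity the expectation is a Q-linear combination of these values.
    """
    return sum((Fraction(c) * moment(alpha) for alpha, c in poly.items()), Fraction(0))


def uniform_moments(alpha):
    """Mixed moments of independent Uniform[0, 1] coordinates: prod 1/(a_i + 1)."""
    return prod((Fraction(1, a + 1) for a in alpha), start=Fraction(1))


def beta_moments(a, b):
    """Moments of one Beta(a, b) coordinate (integer a, b assumed):
    E[X**j] = prod_{r<j} (a + r)/(a + b + r)."""
    def moment(alpha):
        (j,) = alpha
        return prod((Fraction(a + r, a + b + r) for r in range(j)), start=Fraction(1))
    return moment
```

For example, with $p(x_1, x_2) = x_1 x_2 - x_1^2 + 1/2$ and independent uniform coordinates, `expectation_from_moments({(1, 1): 1, (2, 0): -1, (0, 0): Fraction(1, 2)}, uniform_moments)` evaluates $1/4 - 1/3 + 1/2 = 5/12$.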
5. Proof of the Computable de Finetti Theorem

In the remainder of the paper, let $X$ be a real-valued exchangeable sequence with distribution $\chi$, let $\nu$ denote its directing random measure, and let $\mu$ denote the de Finetti measure.

Classically, the joint distribution of X is uniquely determined by the de Finetti 
measure (see Equation 3). We now show that the joint distribution of X is in 
fact computable from the de Finetti measure. 

Proposition 5.1. The distribution $\chi$ is uniformly computable in $\mu$.

Proof. We give a proof that relativizes to $\mu$; without loss of generality, assume that $\mu$ is computable. In order to show that $\chi$, the distribution of $X$, is computable, we must show that $\mathbb{P}(\bigcap_{i=1}^k \{X_i \in \sigma(i)\})$ is a c.e. real, uniformly in $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. Fix $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. Note that, by Corollary 2.5,

$$\mathbb{P}\Big(\bigcap_{i=1}^k \{X_i \in \sigma(i)\}\Big) = \mathbb{E}\Big(\prod_{i=1}^k V_{\sigma(i)}\Big). \tag{29}$$

Let $\eta$ be the joint distribution of $(V_{\sigma(i)})_{i \le k}$ and let $f : [0,1]^k \to [0,1]$ be defined by

$$f(x_1, \dots, x_k) := \prod_{i=1}^k x_i. \tag{30}$$

To complete the proof, we now show that

$$\int f \, d\eta = \mathbb{E}\Big(\prod_{i=1}^k V_{\sigma(i)}\Big) \tag{31}$$

is a c.e. real. Note that the computability of $\mu$ implies that $\eta$ is computable under the right order topology. Furthermore, $f$ is order-preserving, and so is a continuous (and obviously computable) function from $[0,1]^k$ to $[0,1]$, both equipped with the right order topology. Therefore, by Lemma 3.8, $\int f \, d\eta$ is a c.e. real. □
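Identity (29) can be checked numerically in a toy case. The sketch assumes a hypothetical de Finetti measure that selects $\nu = \mathrm{Normal}(m_j, 1)$ with probability $w_j$ from a finite list, so that $V_{\sigma(i)} = \nu(\sigma(i))$ and the cylinder probability is a finite weighted sum of products; the weights, means and intervals are illustrative choices, not taken from the text.

```python
from math import erf, prod, sqrt


def normal_mass(mean, lo, hi):
    """nu((lo, hi)) for nu = Normal(mean, 1), computed from the error function."""
    def cdf(t):
        return 0.5 * (1.0 + erf((t - mean) / sqrt(2.0)))
    return cdf(hi) - cdf(lo)


def cylinder_prob(mixture, sigma):
    """P(X_1 in sigma[0], ..., X_k in sigma[k-1]) = E[prod_i V_{sigma(i)}], as in (29).

    `mixture` lists (w, mean) pairs: the hypothetical de Finetti measure picks
    the directing measure nu = Normal(mean, 1) with probability w.
    """
    return sum(w * prod(normal_mass(mean, lo, hi) for lo, hi in sigma)
               for w, mean in mixture)
```

Reordering the intervals leaves the probability unchanged (exchangeability), while $\mathbb{P}(X_1 \in B, X_2 \in B) \ge \mathbb{P}(X_1 \in B)^2$ reflects $\mathbb{E} V_B^2 \ge (\mathbb{E} V_B)^2$: in this example the coordinates are exchangeable but not independent.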

We will first prove the main theorem under the additional hypothesis that 
the directing random measure is almost surely continuous. We then sketch a 
randomized argument which succeeds with probability one. Finally, we present 
the proof of the main result, which can be seen as a derandomization. 

5.1. Almost Surely Continuous Directing Random Measures 

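For $k \ge 1$ and $\psi \in \mathcal{A}^k$, we say that $\psi$ is an X-continuity set when $X_i \notin \partial\psi(i)$ a.s. for $1 \le i \le k$ (informally, $X$ places no mass on the boundary of $\psi$).

Lemma 5.2. Relative to $\chi$, the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{A}_{\mathbb{Q}}}$ are uniformly c.e. reals and the mixed moments of $\{V_{\overline{\tau}}\}_{\tau \in \mathcal{A}_{\mathbb{Q}}}$ are uniformly co-c.e. reals; in particular, if $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$ is an X-continuity set, then the mixed moment $\mathbb{E}(\prod_{i=1}^k V_{\sigma(i)})$ is a computable real.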
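Proof. Without loss of generality, we may assume that $\chi$ is computable. Let $k \ge 1$ and $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. By Corollary 2.5,

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\sigma(i)}\Big) = \mathbb{P}\Big(\bigcap_{i=1}^k \{X_i \in \sigma(i)\}\Big), \tag{32}$$

which is a c.e. real because $\chi$ is computable. The closure $\overline{\sigma}$ is a co-c.e. closed set in $\mathbb{R}^k$ because we can computably enumerate all $\tau \in \mathcal{A}_{\mathbb{Q}}^k$ contained in the complement of $\overline{\sigma}$. Therefore,

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\overline{\sigma(i)}}\Big) = \mathbb{P}\Big(\bigcap_{i=1}^k \{X_i \in \overline{\sigma(i)}\}\Big) \tag{33}$$

is the measure of a co-c.e. closed set, hence a co-c.e. real. When $\sigma$ is an X-continuity set,

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\sigma(i)}\Big) = \mathbb{E}\Big(\prod_{i=1}^k V_{\overline{\sigma(i)}}\Big), \tag{34}$$

and so the expectation is a computable real. □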
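The gap between (32) and (33) is exactly the mass that the directing measure places on boundary points. A small exact computation, assuming a hypothetical deterministic directing measure $\nu = \tfrac12\,\mathrm{Uniform}[0,1] + \tfrac12\,\delta_{1/2}$ (so $\mu$ is a point mass and the mixed moments factor into products of masses):

```python
from fractions import Fraction
from math import prod

HALF = Fraction(1, 2)
ATOMS = [(HALF, HALF)]  # (location, mass): a point mass 1/2 at x = 1/2
UNIFORM_WEIGHT = HALF   # remaining mass spread uniformly over [0, 1]


def nu(lo, hi, closed):
    """nu((lo, hi)) if not closed, nu([lo, hi]) if closed, for 0 <= lo <= hi <= 1."""
    total = UNIFORM_WEIGHT * (hi - lo)
    for point, mass in ATOMS:
        if (lo <= point <= hi) if closed else (lo < point < hi):
            total += mass
    return total


def mixed_moment(sigma, closed):
    """E[prod_i V_{sigma(i)}] (open intervals) or with closures (closed=True).

    Because nu is deterministic here, the expectation is a product of masses.
    """
    return prod((nu(lo, hi, closed) for lo, hi in sigma), start=Fraction(1))
```

The pair $((0, 1/2), (1/4, 3/4))$ is not an X-continuity set, since the atom at $1/2$ lies on the boundary of its first coordinate, and the open and closed moments differ ($3/16$ versus $9/16$). For $(0, 1/3)$ the two values coincide, as in (34).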

Proposition 5.3 (Almost surely continuous case). Assume that $\nu$ is continuous with probability one. Then $\mu$ is uniformly computable in $\chi$.

Proof. We give a proof that relativizes to $\chi$; assume without loss of generality that $\chi$ is computable. Let $k \ge 1$ and consider $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. The almost sure continuity of $\nu$ implies that $X_i \notin \partial\sigma(i)$ a.s. (each $\partial\sigma(i)$ is a finite set, to which an atomless $\nu$ assigns measure zero), i.e., $\sigma$ is an X-continuity set. Therefore, by Lemma 5.2, the moment $\mathbb{E}(\prod_{i=1}^k V_{\sigma(i)})$ is a computable real. The computable moment theorem (Theorem 4.2) then implies that the joint distribution of the variables $\{V_\tau\}_{\tau \in \mathcal{A}_{\mathbb{Q}}}$ is computable under the standard topology, and so $\mu$ is computable under the weaker right order topology. By Corollary 3.7, this implies that $\mu$ is computable. □

5.2. "Randomized" Proof Sketch 

In general, the joint distribution of $\{V_\sigma\}_{\sigma \in \mathcal{A}_{\mathbb{Q}}}$ is not computable under the standard topology because the directing random measure $\nu$ may, with nonzero probability, have a point mass on a fixed rational. In this case, the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{A}_{\mathbb{Q}}}$ are c.e., but not necessarily co-c.e., reals (relative to $\chi$), and so the computable moment theorem (Theorem 4.2) is inapplicable. For arbitrary directing random measures, we give a proof of the computable de Finetti theorem which works regardless of the location of point masses.

Consider the following sketch of a "randomized algorithm": We independently sample a countably infinite set of real numbers $A \subseteq \mathbb{R}$ from a computable, absolutely continuous distribution which has support everywhere on the real line (e.g., a Gaussian or Cauchy). Let $\mathcal{A}_A$ denote the algebra generated by open intervals with endpoints in $A$. Note that, with probability one (over the draw), $A$ will be dense in $\mathbb{R}$ and $\nu(A) = 0$ almost surely, and so $X$ will place no mass on the boundary of any $\psi \in \mathcal{A}_A^k$. Suppose we have chosen such an $A$. Now we can proceed analogously to the case where $\nu$ is almost surely continuous, relative to an oracle $\mathbf{A}$ which encodes $A$.

We begin by proving an extension of Lemma 5.2 that shows that the mixed moments of variables in terms of the new basis are $\mathbf{A}$-computable, relative to $\chi$.

Definition 5.4. We call $\psi \in \mathcal{A}^k$ a refinement of $\varphi \in \mathcal{A}^k$, and write $\psi \lhd \varphi$, when

$$\overline{\psi(i)} \subseteq \varphi(i) \tag{35}$$

for all $i \le k$.

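Lemma 5.5. Let $A$ be sampled as above. Relative to $\chi$, the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{A}_A}$ are uniformly $\mathbf{A}$-c.e. reals and the mixed moments of $\{V_{\overline{\tau}}\}_{\tau \in \mathcal{A}_A}$ are uniformly $\mathbf{A}$-co-c.e. reals. In particular, if $\psi \in \mathcal{A}_A^k$ is an X-continuity set, then the mixed moment $\mathbb{E}(\prod_{i=1}^k V_{\psi(i)})$ is an $\mathbf{A}$-computable real relative to $\chi$.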

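Proof. Fix $\psi \in \mathcal{A}_A^k$. All computability claims are relative to $\chi$. We may $\mathbf{A}$-compute a sequence

$$\sigma_1, \sigma_2, \dots \in \mathcal{A}_{\mathbb{Q}}^k \tag{36}$$

such that for each $n \ge 1$,

$$\sigma_n \lhd \sigma_{n+1} \quad\text{and}\quad \bigcup_m \sigma_m = \psi. \tag{37}$$

For each $n$,

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\sigma_n(i)}\Big) \tag{38}$$

is a c.e. real, and so their supremum is an $\mathbf{A}$-c.e. real.

If $\varphi, \varphi' \in \mathcal{A}^k$ satisfy $\varphi \subseteq \varphi'$, then $V_\varphi \le V_{\varphi'}$ (a.s.). Multiplication is continuous, and so the dominated convergence theorem gives us

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\psi(i)}\Big) = \sup_n \mathbb{E}\Big(\prod_{i=1}^k V_{\sigma_n(i)}\Big), \tag{39}$$

which we have already noted is an $\mathbf{A}$-c.e. real. The co-c.e. result follows similarly from a sequence of nested unions of rational intervals whose intersection is $\overline{\psi}$. The final claim is immediate as in Lemma 5.2. □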
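The sequence in (36) and (37) needs only rational approximations of the endpoints of $\psi$. A one-dimensional sketch, assuming each endpoint is available through a hypothetical oracle `left(m)` or `right(m)` that returns a rational within $2^{-m}$ of it; here $1/\sqrt{2}$ merely stands in for an element of $A$, and all names are ours:

```python
from fractions import Fraction
from math import isqrt


def inv_sqrt2_approx(m):
    """Rational q with |q - 1/sqrt(2)| <= 2**-m, standing in for an oracle for A."""
    # isqrt(2 * 4**m) / 2**m is within 2**-m of sqrt(2); halving halves the error.
    return Fraction(isqrt(2 * 4 ** m), 2 ** (m + 1))


def rational_refinements(left, right, count):
    """Rational intervals sigma_n = (a_n, b_n) increasing to (l, r).

    left(m), right(m) return rationals within 2**-m of the endpoints l < r.
    Each closure [a_n, b_n] lies inside (a_{n+1}, b_{n+1}), i.e. sigma_n <| sigma_{n+1}.
    """
    out = []
    for n in range(1, count + 1):
        eps = Fraction(1, 2 ** n)
        a = left(n + 2) + eps   # lies in [l + 3*eps/4, l + 5*eps/4]
        b = right(n + 2) - eps  # lies in [r - 5*eps/4, r - 3*eps/4]
        if a < b:               # only the first few intervals can be empty
            out.append((a, b))
    return out
```

Shifting each approximation inward by $2^{-n}$ while requesting precision $2^{-(n+2)}$ makes the left endpoints strictly decrease and the right endpoints strictly increase, which gives $\sigma_n \lhd \sigma_{n+1}$; the union is $\psi$ because the shifts tend to zero.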

The proof of the computable moment theorem (Theorem 4.2) relativizes, and so, since every element of $\mathcal{A}_A^k$ is an X-continuity set, Lemma 5.5 implies that the joint distribution of $\{V_\psi\}_{\psi \in \mathcal{A}_A}$ is $\mathbf{A}$-computable in $\chi$. This joint distribution also classically determines the de Finetti measure. Moreover, we can $\mathbf{A}$-compute (relative to $\chi$) the desired representation with respect to the original basis.
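Lemma 5.6. Let $A$ be sampled as above. The de Finetti measure $\mu$ is uniformly $\mathbf{A}$-computable in $\chi$.

Proof. We show that the joint distribution of $\{V_\tau\}_{\tau \in \mathcal{A}_{\mathbb{Q}}}$ is $\mathbf{A}$-computable under the right order topology. Let $m, k \ge 1$, let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$, and let $\tau \in \mathcal{A}_{\mathbb{Q}}^k$. Note that $\tau$ is an $\mathbf{A}$-c.e. open set with respect to the basis $\mathcal{A}_A^k$, and so we can $\mathbf{A}$-computably enumerate a sequence $\sigma_1, \sigma_2, \dots \in \mathcal{A}_A^k$ such that $\bigcup_n \sigma_n = \tau$. Without loss of generality, we may assume that $\sigma_n \subseteq \sigma_{n+1}$. Note that $V_{\sigma_n(j)} \le V_{\tau(j)}$ (a.s.) for all $n \ge 1$ and $j \le k$. By the continuity of $\mathbb{P}$,

$$\mathbb{P}\Big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{V_{\tau(j)} > c_{ij}\}\Big) = \sup_n \mathbb{P}\Big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{V_{\sigma_n(j)} > c_{ij}\}\Big). \tag{40}$$

The right-hand side is the supremum of a uniformly $\mathbf{A}$-computable set of $\mathbf{A}$-computable reals, relative to $\chi$, hence an $\mathbf{A}$-c.e. real relative to $\chi$. □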

If we consider $\mathbf{A}$ to be a random oracle, then the representation of $\mu$ that was computed (relative to $\chi$) using $\mathbf{A}$ is also a random variable. However, while $\mathbf{A}$ is almost surely noncomputable, the random representation of $\mu$ is almost surely constant. Recall that the distribution of the oracle $\mathbf{A}$ is computable because we started with a computable distribution on $A$. Hence we could compute $\mu$ by composing the distribution of $\mathbf{A}$ with the (almost surely constant) representation of $\mu$.

A proof along these lines could be made precise using a simulation argument to calculate the pushforward measure of the random representation of $\mu$. The basis elements assigned positive probability (and therefore probability one) are precisely those which hold of $\mu$. Instead, in Section 5.3, we complete the proof by explicitly computing the representation of $\mu$ in terms of our rational basis. This construction can be seen as a "derandomization" of the above algorithm.

Alternatively, the above sketch could be interpreted as a degenerate probabilistic process (see Schröder and Simpson [25]) which returns a representation of the de Finetti measure with probability one. Schröder [22] shows that representations in terms of probabilistic processes are computably reducible to representations of computable distributions.

The structure of the derandomized argument occurs in other proofs in computable analysis and probability theory. Weihrauch [24, Thm. 3.6] proves a computable integration result via an argument that could likewise be seen as a derandomization of an algorithm which densely subdivides the unit interval at random locations to find continuity sets. Bosserhoff [23, Lem. 2.15] uses a similar argument to compute a basis for a computable metric space, for which every basis element is a continuity set; this suggests an alternative approach to completing our proof. Müller [27, Thm. 3.7] uses a similar construction to find open hypercubes such that for any $\epsilon > 0$, the probability on their boundaries is less than $\epsilon$. These arguments also resemble the classic proof of the Portmanteau theorem [5, Thm. 4.25], in which an uncountable family of sets with disjoint boundaries is defined, almost all of which are continuity sets.

5.3. "Derandomized" Construction 

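Fix $m, k \ge 1$ and let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. Define

$$I_C : [0,1]^k \to [0,1] \tag{41}$$

to be the indicator function for the set

$$\bigcup_{i=1}^m (c_{i1}, 1] \times \cdots \times (c_{ik}, 1]. \tag{42}$$

For $n \in \omega$, we denote by $p_{n,C}$ the polynomial $p_{n,\sigma}$ (as defined in Lemma 4.1), where

$$\sigma := \bigcup_{i=1}^m (c_{i1}, 2) \times \cdots \times (c_{ik}, 2) \in \mathcal{A}_{\mathbb{Q}}^k. \tag{43}$$

Here, we arbitrarily chose $2 > 1$ so that the sequence of polynomials $\{p_{n,C}\}_{n \in \omega}$ converges pointwise from below to $I_C$ on $[0,1]^k$.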

Let $x = (x_1, \dots, x_k)$ and $y = (y_1, \dots, y_k)$. We can write

$$p_{n,C}(x) = p^+_{n,C}(x) - p^-_{n,C}(x), \tag{44}$$

where $p^+_{n,C}$ and $p^-_{n,C}$ are polynomials with positive coefficients. Define the $2k$-variable polynomial

$$q_{n,C}(x, y) := p^+_{n,C}(x) - p^-_{n,C}(y). \tag{45}$$

We denote

$$q_{n,C}\big(V_{\psi(1)}, \dots, V_{\psi(k)}, V_{\zeta(1)}, \dots, V_{\zeta(k)}\big) \tag{46}$$

by $q_{n,C}(V_\psi, V_\zeta)$, and similarly with $p_{n,C}$.
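The sign split in (44) and (45) is purely syntactic. A minimal sketch with polynomials stored as dictionaries from exponent tuples to coefficients (a representation chosen here, with hypothetical helper names). On $[0,1]^k$ the resulting $q$ is nondecreasing in $x$ and nonincreasing in $y$, and $q(x, x) = p(x)$:

```python
from fractions import Fraction
from math import prod


def split_signs(poly):
    """Return (p_plus, p_minus), both with positive coefficients, so p = p_plus - p_minus."""
    plus = {a: c for a, c in poly.items() if c > 0}
    minus = {a: -c for a, c in poly.items() if c < 0}
    return plus, minus


def evaluate(poly, x):
    """Evaluate a {exponent tuple: coefficient} polynomial at the point x."""
    return sum((c * prod((xi ** ai for xi, ai in zip(x, a)), start=Fraction(1))
                for a, c in poly.items()), Fraction(0))


def q(poly, x, y):
    """The 2k-variable polynomial q(x, y) = p_plus(x) - p_minus(y), as in (45)."""
    plus, minus = split_signs(poly)
    return evaluate(plus, x) - evaluate(minus, y)
```

The monotonicity is what makes $q$ useful: raising the first argument from below and lowering the second from above can only increase $q$.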

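Proposition 5.7. Let $n \in \omega$, let $k, m \ge 1$, let $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$, and let $C \in \mathbb{Q}^{m \times k}$. Then $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ is a c.e. real relative to $\chi$, uniformly in $n$, $\sigma$, and $C$.

Proof. By Lemma 5.2, relative to $\chi$, each monomial of $p^+_{n,C}(V_\sigma)$ has a c.e. real expectation, and each monomial of $p^-_{n,C}(V_{\overline{\sigma}})$ has a co-c.e. real expectation. Since both polynomials have positive coefficients, the linearity of expectation expresses $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ as a nonnegative combination of c.e. reals minus a nonnegative combination of co-c.e. reals, which is a c.e. real. □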
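The proof of Proposition 5.7 amounts to an interval-arithmetic observation: a lower bound for $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ is obtained from lower bounds for the c.e. moments and upper bounds for the co-c.e. moments, and these bounds improve monotonically. A sketch, assuming hypothetical callables `lower(alpha, s)` and `upper(alpha, s)` that return the stage-$s$ rational approximations (nondecreasing and nonincreasing in $s$, respectively):

```python
from fractions import Fraction


def stage_lower_bound(plus, minus, lower, upper, s):
    """Stage-s lower bound for E p_plus(V_sigma) - E p_minus(V_closure(sigma)).

    plus, minus: {exponent tuple: positive coefficient}, as in (44).
    lower(alpha, s): lower bound for the c.e. moment of V_sigma (nondecreasing in s).
    upper(alpha, s): upper bound for the co-c.e. moment of V_closure (nonincreasing in s).
    Because all coefficients are positive, the result is nondecreasing in s.
    """
    gain = sum((c * lower(a, s) for a, c in plus.items()), Fraction(0))
    loss = sum((c * upper(a, s) for a, c in minus.items()), Fraction(0))
    return gain - loss
```

Enumerating these bounds over all stages $s$ is what it means for the expectation to be a c.e. real.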

We are now ready to prove the main theorem. 
Proof of Theorem 2.3 (Computable de Finetti). The distribution $\chi$ is uniformly computable in $\mu$ by Proposition 5.1. We now give a proof of the other direction, showing that the joint distribution of $\{V_\sigma\}_{\sigma \in \mathcal{A}_{\mathbb{Q}}}$ is computable, relative to $\chi$, under the right order topology.

Let $k, m \ge 1$, let $\pi \in \mathcal{A}_{\mathbb{Q}}^k$, and let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. For $\zeta \in \mathcal{A}^k$, let $V_\zeta$ denote the $k$-tuple $(V_{\zeta(1)}, \dots, V_{\zeta(k)})$, and similarly for $V_{\overline{\zeta}}$. Take $I_C$ to be defined as above in (41). It suffices to show that

$$\mathbb{P}\Big(\bigcup_{i=1}^m \bigcap_{j=1}^k \{V_{\pi(j)} > c_{ij}\}\Big) = \mathbb{E}\, I_C(V_\pi) \tag{47}$$

is a c.e. real relative to $\chi$, uniformly in $\pi$ and $C$. We do this by a series of reductions, which results in a supremum over quantities of the form $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ for $\sigma \in \mathcal{A}_{\mathbb{Q}}^k$. By Proposition 5.7, these quantities are c.e. reals relative to $\chi$.

Note that (42) is an open set in the right order topology on $[0,1]^k$, and so $I_C$ is an order-preserving function. In particular, if $\psi, \varphi \in \mathcal{A}^k$ satisfy $\psi \lhd \varphi$, then $V_\psi \le V_\varphi$ (a.s.), and so

$$\mathbb{E}\, I_C(V_\psi) \le \mathbb{E}\, I_C(V_\pi) \tag{48}$$

for all $\psi \in \mathcal{A}^k$ such that $\psi \lhd \pi$. Therefore, by the density of the reals and the dominated convergence theorem, we have that

$$\mathbb{E}\, I_C(V_\pi) = \sup_{\psi \lhd \pi} \mathbb{E}\, I_C(V_\psi), \tag{49}$$

where $\psi$ ranges over $\mathcal{A}^k$. Recall that the polynomials $\{p_{n,C}\}_{n \in \omega}$ converge pointwise from below to $I_C$ on $[0,1]^k$. Therefore, by the dominated convergence theorem,

$$\mathbb{E}\, I_C(V_\psi) = \sup_n \mathbb{E}\, p_{n,C}(V_\psi). \tag{50}$$

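Multiplication is continuous, and so the dominated convergence theorem gives us

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\psi(i)}\Big) = \sup_{\sigma \lhd \psi} \mathbb{E}\Big(\prod_{i=1}^k V_{\sigma(i)}\Big) \quad\text{and} \tag{51}$$

$$\mathbb{E}\Big(\prod_{i=1}^k V_{\overline{\psi(i)}}\Big) = \inf_{\tau \rhd \psi} \mathbb{E}\Big(\prod_{i=1}^k V_{\tau(i)}\Big), \tag{52}$$

where $\sigma$ and $\tau$ range over $\mathcal{A}_{\mathbb{Q}}^k$, and $\tau \rhd \psi$ means $\psi \lhd \tau$. Therefore, by the linearity of expectation,

$$\mathbb{E}\, p^+_{n,C}(V_\psi) = \sup_{\sigma \lhd \psi} \mathbb{E}\, p^+_{n,C}(V_\sigma) \quad\text{and} \tag{53}$$

$$\mathbb{E}\, p^-_{n,C}(V_{\overline{\psi}}) = \inf_{\tau \rhd \psi} \mathbb{E}\, p^-_{n,C}(V_\tau). \tag{54}$$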
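If $\psi$ is an X-continuity set, then $V_{\psi(i)} = V_{\overline{\psi(i)}}$ a.s. for $i \le k$, and so

$$\begin{aligned}
\mathbb{E}\, p_{n,C}(V_\psi) &= \mathbb{E}\, q_{n,C}(V_\psi, V_\psi) && (55)\\
&= \mathbb{E}\, q_{n,C}(V_\psi, V_{\overline{\psi}}) && (56)\\
&= \mathbb{E}\, p^+_{n,C}(V_\psi) - \mathbb{E}\, p^-_{n,C}(V_{\overline{\psi}}) && (57)\\
&= \sup_{\sigma \lhd \psi} \mathbb{E}\, p^+_{n,C}(V_\sigma) - \inf_{\tau \rhd \psi} \mathbb{E}\, p^-_{n,C}(V_\tau) && (58)\\
&= \sup_{\sigma \lhd \psi \lhd \tau} \mathbb{E}\, q_{n,C}(V_\sigma, V_\tau). && (59)
\end{aligned}$$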

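Because $\nu$ has at most countably many point masses, the $\psi \in \mathcal{A}^k$ which are X-continuity sets are dense in $\mathcal{A}^k$. Therefore, with $\psi$ ranging over X-continuity sets on the right-hand side,

$$\sup_{\psi \lhd \pi} \mathbb{E}\, p_{n,C}(V_\psi) = \sup_{\psi \lhd \pi} \sup_{\sigma \lhd \psi \lhd \tau} \mathbb{E}\, q_{n,C}(V_\sigma, V_\tau). \tag{60}$$

Note that $\{(\sigma, \tau) : (\exists \psi \lhd \pi)\ \sigma \lhd \psi \lhd \tau\} = \{(\sigma, \tau) : \sigma \lhd \pi \text{ and } \sigma \lhd \tau\}$. Hence

$$\sup_{\psi \lhd \pi} \sup_{\sigma \lhd \psi} \sup_{\tau \rhd \psi} \mathbb{E}\, q_{n,C}(V_\sigma, V_\tau) = \sup_{\sigma \lhd \pi} \sup_{\tau \rhd \sigma} \mathbb{E}\, q_{n,C}(V_\sigma, V_\tau). \tag{61}$$

Again by dominated convergence we have

$$\sup_{\tau \rhd \sigma} \mathbb{E}\, q_{n,C}(V_\sigma, V_\tau) = \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}}). \tag{62}$$

Combining (47), (49), (50), (60), (61), and (62), we have

$$\mathbb{E}\, I_C(V_\pi) = \sup_{n} \sup_{\sigma \lhd \pi} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}}). \tag{63}$$

Finally, by Proposition 5.7,

$$\{\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}}) : \sigma \lhd \pi \text{ and } n \in \omega\} \tag{64}$$

is a set of uniformly c.e. reals, relative to $\chi$, and so its supremum (47) is a c.e. real relative to $\chi$, uniformly in $\pi$ and $C$. □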

6. Exchangeability in Probabilistic Functional Programming Languages 

The computable de Finetti theorem has implications for the semantics of probabilistic functional programming languages, and in particular, gives conditions under which it is possible to remove uses of mutation (i.e., code which modifies a program's internal state). Furthermore, an implementation of the computable de Finetti theorem itself performs this code transformation automatically.

For context, we provide some background on probabilistic functional programming languages. We then describe the code transformation performed by the computable de Finetti theorem, using the example of the Pólya urn and Beta-Bernoulli process discussed earlier. Finally, we discuss partial exchangeability and its role in recent machine learning applications.
6.1. Probabilistic Functional Programming Languages

Functional programming languages with probabilistic choice operators have recently been proposed as universal languages for statistical modeling (e.g., IBAL [30], λ○ [31], Church [32], and HANSEI [33]). Within domain theory, idealized functional languages that can manipulate exact real numbers, such as Escardó's RealPCF+ [34] and Plotkin's PCF++ [35], have also been extended by probabilistic choice operators (e.g., by Escardó [36] and Saheb-Djahromi [37]).

The semantics of probabilistic programs have been studied extensively in theoretical computer science in the context of randomized algorithms, probabilistic model checking, and other areas. However, the application of probabilistic programs to universal statistical modeling has a somewhat different character from much of the other work on probabilistic programming languages.

In Bayesian analysis, the goal is to use observed data to understand hidden 
variables in a probabilistic model. This type of inductive reasoning, from evidence 
to hypothesis, can be thought of as inferring the hidden states of a program 
that generates the observed output. One speaks of the conditional execution of 
probabilistic programs, in which they are "run backwards" to sample from the 
conditional probability distribution given the observed data. 

One important difference lies in the algorithms used for conditional inference. Goodman et al. [32] describe the language Church, which extends Scheme, and which implements approximate conditional execution via Markov chain Monte Carlo (which can be thought of as a random walk over computational histories). Park, Pfenning, and Thrun [31] describe the language λ○, which extends OCaml, and which implements approximate conditional execution by Monte Carlo importance sampling. Ramsey and Pfeffer [38] describe an implementation of the probability monad in Haskell, and develop "measure terms" suitable for implementing the sum-product algorithm, giving a more efficient method for calculating expectations.

These languages also stress the flexibility of representations. In statistics 
and especially nonparametric statistics, there is an emphasis on higher-order 
distributions (e.g., distributions on distributions, or distributions on trees), and 
so it is crucial to work in a language which can express these types. Functional 
languages with randomness are therefore a natural choice. 

The de Finetti theorem gives two different representations for exchangeable sequences, each of which has its own advantages with respect to space and time complexity. In Section 6.3 we provide several examples of the representational changes provided by the de Finetti transformation. Some of these representational issues are also examined in Roy et al. [39]. Mansinghka [40] has also described some uses of exchangeable sequences, and situations where one can exploit conditional independence for improved parallel execution.
6.2. Code Transformations

We now describe the code transformation performed by the computable de Finetti theorem. For concreteness, we will explore this connection using Church, a probabilistic functional programming language. Church extends Scheme (a dialect of LISP) with a binary-valued flip procedure, which denotes the Bernoulli distribution. In Church, an expression denotes the distribution induced by evaluation. For example,

    (+ (flip) (flip) (flip))

denotes a Binomial(n = 3, p = 1/2) distribution and

    (lambda (x) (if (= 1 (flip)) x 0))

denotes the probability kernel $x \mapsto \tfrac{1}{2}(\delta_x + \delta_0)$, where $\delta_r$ denotes the Dirac measure concentrated at the real $r$. Church is call-by-value, and so

    (= (flip) (flip))

denotes a Bernoulli distribution on {true, false}, while the application of the procedure

    (lambda (x) (= x x))

to the argument (flip), written

    ((lambda (x) (= x x)) (flip))

denotes $\delta_{\mathrm{true}}$. (For more examples, see [32].) In Scheme, one can modify the state of a non-local variable using mutation via the set! procedure. In other functional programming languages, non-local state may be implemented via other methods. (For example, in Haskell, one could use the state monad.) If an expression modifies its environment (using mutation or otherwise), it might not denote a fixed distribution. For example, a procedure may keep a counter variable and return an increasing sequence of integers on repeated calls.

Consider the Beta-Bernoulli process and the Pólya urn scheme written in Church. (For their mathematical characterization, see Section 2.1.) While these two processes look different, they induce the same distribution on sequences. We define two procedures, sample-beta-coin and sample-polya-coin, such that the application of either procedure returns a procedure of no arguments whose application returns (random) binary values. Fix a, b > 0. Consider the following definitions of sample-beta-coin and sample-polya-coin (and recall that the (lambda () ...) special form creates a procedure of no arguments):
(i)

    (define (sample-beta-coin)
      (let ((weight (beta a b)))
        (lambda () (flip weight))))

(ii)

    (define (sample-polya-coin)
      (let ((red a)
            (total (+ a b)))
        (lambda ()
          (let ((x (flip (/ red total))))
            (set! red (+ red x))
            (set! total (+ total 1))
            x))))

The definitions

    (define my-beta-coin (sample-beta-coin))
    (define my-polya-coin (sample-polya-coin))

define a procedure my-beta-coin such that repeated applications (my-beta-coin) induce a random binary sequence, and similarly with my-polya-coin.
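For readers more at home in mainstream languages, a rough Python analogue of (i) and (ii) is sketched below. It is our own translation, not part of Church: `nonlocal` plays the role of set!, and a and b are passed as arguments rather than fixed globally. The function names mirror the Church ones but are otherwise hypothetical.

```python
import random


def sample_beta_coin(a, b, rng=random):
    """Analogue of (i): draw weight ~ Beta(a, b) once; later calls never change it."""
    weight = rng.betavariate(a, b)

    def flip():
        return 1 if rng.random() < weight else 0

    return flip


def sample_polya_coin(a, b, rng=random):
    """Analogue of (ii): every call updates the captured urn counts (like set!)."""
    red, total = a, a + b

    def draw():
        nonlocal red, total
        x = 1 if rng.random() < red / total else 0
        red += x
        total += 1
        return x

    return draw
```

Only the Pólya version mutates captured variables; the Beta version fixes `weight` once and then makes independent draws, which is why it can be called from parallel workers without sharing any updated state.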

Evaluating (my-beta-coin) returns a 1 with probability weight and a 0 otherwise, where the shared weight parameter is itself drawn from a Beta(a, b) distribution on [0, 1]. Note that the sequence of values obtained by evaluating (my-beta-coin) is exchangeable but not i.i.d. (e.g., an initial sequence of ten 1s leads one to predict that the next draw is more likely to be 1 than 0). However, conditioned on the weight (a random variable within the opaque procedure my-beta-coin) the sequence is i.i.d. A second random coin constructed by

    (define your-beta-coin (sample-beta-coin))

will have a different weight (a.s.) and will generate a sequence that is independent of that generated by my-beta-coin. The sequence induced by repeated applications of my-beta-coin is exchangeable because applications of flip return independent samples.

The code in (ii) implements the Pólya urn scheme with a red balls and b black balls (see [15, Chap. 11.4]), and so the sequence of return values from my-polya-coin is exchangeable. (One can see that the sequence is exchangeable by noting that its joint distribution depends only on the numbers of red and black balls drawn, and not on the order in which they are drawn.)

Because the sequence induced by repeated applications of my-polya-coin is exchangeable, de Finetti's theorem implies that its distribution is equivalent to that induced by i.i.d. draws from some random measure (in particular, the directing random measure). In the case of the Pólya urn scheme, the directing random measure is a random Bernoulli whose weight parameter has a Beta(a, b) distribution. Therefore (sample-beta-coin) denotes the de Finetti measure of the sequence of repeated applications of my-beta-coin (and also of your-beta-coin, incidentally). The distributions of the sequences induced by repeated applications of my-beta-coin and my-polya-coin are identical.
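The equality of these distributions can be checked exactly on short prefixes. The sketch below (hypothetical helper names, exact rational arithmetic, integer or rational a and b assumed) computes the probability of a given 0/1 prefix once by running the urn's sequential update rule, and once as the mixed moment $\mathbb{E}[W^s (1-W)^{k-s}]$ for $W \sim \mathrm{Beta}(a, b)$, where $s$ is the number of 1s among $k$ draws:

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def polya_prob(bits, a, b):
    """Probability that the urn of (ii), started with a red and b black, yields `bits`."""
    red, total, p = Fraction(a), Fraction(a + b), Fraction(1)
    for x in bits:
        p *= red / total if x == 1 else (total - red) / total
        red += x
        total += 1
    return p


def beta_bernoulli_prob(bits, a, b):
    """The same probability via de Finetti: E[W^s (1 - W)^(k - s)], W ~ Beta(a, b)."""
    k, s = len(bits), sum(bits)
    num = (prod((Fraction(a + r) for r in range(s)), start=Fraction(1))
           * prod((Fraction(b + r) for r in range(k - s)), start=Fraction(1)))
    den = prod((Fraction(a + b + r) for r in range(k)), start=Fraction(1))
    return num / den


def same_for_all_orders(bits, a, b):
    """True if every reordering of `bits` has the same urn probability (exchangeability)."""
    return len({polya_prob(p, a, b) for p in permutations(bits)}) == 1
```

For instance, with a = b = 1 the prefix (1, 0) has probability $\tfrac12 \cdot \tfrac13 = \tfrac16$ under the urn, matching $\mathbb{E}[W(1-W)] = 1/6$ for a uniform $W$.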

However, there is an important difference between these two implementations. While (sample-beta-coin) denotes the de Finetti measure, (sample-polya-coin) does not, because my-polya-coin does not denote a fixed distribution: the internal state of the procedure my-polya-coin changes after each iteration, as the sufficient statistics are updated (using the mutation operator set!). Therefore, each element of the sequence of repeated applications of my-polya-coin is generated from a different distribution. Even though the sequence of applications of my-polya-coin has the same distribution as those given by repeated applications of my-beta-coin, the procedure my-polya-coin denotes a probability kernel which depends on the state.

In contrast, my-beta-coin does not modify itself via mutation; after it is 
randomly initialized, the value of weight does not change during the execution 
of the program. Therefore, in each run of the program, my-beta-coin denotes 
a fixed distribution — a particular Bernoulli. The procedure my-beta-coin is 
precisely the directing random measure of the Beta-Bernoulli process produced 
by repeated applications of my-beta-coin. 

An implementation of the computable de Finetti theorem (Theorem 2.3) transforms (ii) into a procedure which does not use mutation and whose application is equivalent in distribution to the evaluation of (sample-beta-coin). In fact, assuming we know that the code generates a binary sequence (and therefore that the directing random measure will be a random Bernoulli), we can transform the de Finetti measure into a procedure that does not use mutation and whose application is equivalent in distribution to the evaluation of (beta a b).

In the general case, given a program that generates an exchangeable sequence of reals, an implementation of the computable de Finetti proof can generate a mutation-free procedure generated-code such that

    (define (sample-directing-random-measure)
      (let ((shared-randomness (uniform 0 1)))
        (lambda () (generated-code shared-randomness))))

defines the de Finetti measure.

In addition to their simpler semantics, mutation-free procedures are often 
desirable for practical reasons. For example, having sampled the directing random 
measure, an exchangeable sequence of random variables can be efficiently sampled 
in parallel without the overhead necessary to communicate sufficient statistics. 
6.3. Partial Exchangeability of Arrays and Other Data Structures

The example above involved binary sequences, but the computable de Finetti theorem can be used to transform implementations of real exchangeable sequences. Consider the following exchangeable sequence whose combinatorial structure is known as the Chinese restaurant process (see Aldous [14]). Let $\alpha > 0$ be a computable real and let $H$ be a computable distribution on $\mathbb{R}$. For $n \ge 1$, each $X_n$ is sampled in turn according to the conditional distribution

$$\mathbb{P}[X_n \in \cdot \mid X_1, \dots, X_{n-1}] = \frac{1}{n - 1 + \alpha}\Big(\alpha H + \sum_{i=1}^{n-1} \delta_{X_i}\Big). \tag{65}$$

The sequence $\{X_n\}_{n \ge 1}$ is exchangeable and the directing random measure is a Dirichlet process with parameter $\alpha H$. Given such a program, we can automatically recover the underlying Dirichlet process prior, samples from which produce random measures whose discrete structure was characterized by Sethuraman's "stick-breaking construction" [41]. Note that the random measure is not produced in the same manner as Sethuraman's construction and certainly is not of closed form. But the resulting mathematical objects have the same structure and distribution.
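A program of the kind this paragraph has in mind can be sketched in Python; the choice $H = \mathrm{Normal}(0, 1)$ and the function names are assumptions made for illustration. Each new value is a fresh draw from $H$ with probability $\alpha/(n-1+\alpha)$ and otherwise repeats a uniformly chosen earlier value, which is the predictive rule of the Chinese restaurant process:

```python
import random


def chinese_restaurant_process(alpha, sample_h, n, rng=random):
    """Sample X_1, ..., X_n sequentially from the Chinese restaurant process.

    X_m is a fresh draw from H with probability alpha / (m - 1 + alpha);
    otherwise it repeats a uniformly chosen earlier value X_i, i < m.
    """
    xs = []
    for m in range(1, n + 1):
        if rng.random() < alpha / (m - 1 + alpha):
            xs.append(sample_h(rng))   # a new value drawn from the base distribution H
        else:
            xs.append(rng.choice(xs))  # reuse an earlier value
    return xs


def standard_normal(rng):
    """Hypothetical base distribution H = Normal(0, 1)."""
    return rng.gauss(0.0, 1.0)
```

Like (ii), this sampler carries state (the list of earlier values) from one draw to the next; the repeated values it produces reflect the almost sure discreteness of the Dirichlet process directing measure.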

Sequences of random objects other than reals can sometimes be given de Finetti-type representations. For example, the Indian buffet process, defined by Griffiths and Ghahramani [42], is the combinatorial process underlying a set-valued exchangeable sequence that can be written in a way analogous to the Pólya urn in (ii). Just as the Chinese restaurant process gives rise to the Dirichlet process, the Indian buffet process gives rise to a non-homogeneous Beta-Bernoulli process. Unlike in the Chinese restaurant process example, which was a sequence of random reals, the computable de Finetti theorem is not applicable to exchangeable sequences of random sets. The directing random measure was identified as a random Bernoulli process, whose base measure is a Beta process (see Thibaux and Jordan [43]). A non-homogeneous Bernoulli process is a sequence of independent (but not necessarily identically distributed) Bernoulli random variables $\{Z_j\}_{j \ge 1}$, and the Bernoulli sequence is the random set $\{n : Z_n = 1\}$. For the Beta-Bernoulli process, the Bernoulli sequence is almost surely finite. A "stick-breaking construction" of the Beta-Bernoulli process given by Teh, Görür, and Ghahramani [44] is analogous to the code in (i), but gives only a $\Delta_1$-index for the Bernoulli sequence, rather than its canonical index (see Soare [17, II.2]). This was first noted by Roy et al. [39]. The computability of sampling independent Bernoulli sequences of a Beta-Bernoulli process remains open. However, an Indian buffet process on a discrete base measure can be classically transformed into an exchangeable sequence of integer indices (representing the finite subsets of the discrete support). If we are given such a representation, the computable de Finetti theorem implies the existence of a computable de Finetti measure that gives canonical indices for the sets.

Combinatorial structures other than sequences have been given de Finetti- 
type representational theorems. For example, an array of random variables 
is called separately (or jointly) exchangeable when its distribution is invariant 
under (simultaneous) permutations of the rows and columns and their higher- 
dimensional analogues. Nearly fifty years after de Finetti's result, Aldous [45] and 
Hoover [46] showed that the entries of an infinite array satisfying either separate 
or joint exchangeability are conditionally i.i.d. These results have been connected 
with the theory of graph limits by Diaconis and Janson [47] and Austin [48] by 
considering the adjacency matrix of an exchangeable random graph. 

As in the case of the Indian buffet process, structured probabilistic models in machine learning can often be represented in multiple ways, each of which has different advantages (e.g., efficient inference procedures, representational simplicity, compositionality, etc.). The infinite relational model of Kemp et al. [49] can be viewed as a partially exchangeable array, while the hierarchical stochastic block model built with the Mondrian process of Roy and Teh [50] is described using the Aldous-Hoover representation, making the conditional independence explicit. If the computable de Finetti theorem could be extended to partially exchangeable settings, it would provide analogous uniform transformations on a wider range of data structures.